Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 3000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 234.4 KiB |
| Average record size in memory | 80.0 B |
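The memory rows are consistent with every column being stored as 8-byte values. The check below assumes nine float64 columns plus a 64-bit integer index of the same length; the report does not state the dtypes or the index type. Under that assumption the whole frame takes 234.375 KiB, or 80 bytes per record. A single column counted together with its index takes 46.875 KiB, which is consistent with the 46.9 KiB "Memory size" shown for each variable below.

```python
# Consistency check for the memory figures in the report.
# Assumption: 9 float64 columns and an int64 index, 8 bytes per value each.
N_ROWS = 3000
N_COLUMNS = 9
BYTES_PER_VALUE = 8

column_bytes = N_ROWS * BYTES_PER_VALUE  # data of one column
index_bytes = N_ROWS * BYTES_PER_VALUE   # the assumed 64-bit index

# Whole frame: nine data columns plus one shared index.
total_bytes = N_COLUMNS * column_bytes + index_bytes
# One variable reported together with its index.
per_variable_bytes = column_bytes + index_bytes

print(total_bytes / 1024, "KiB in total")
print(total_bytes / N_ROWS, "bytes per record")
print(per_variable_bytes / 1024, "KiB per variable")
```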
Variable types
| Numeric | 9 |
|---|---|
Alerts
| Alert | Type |
|---|---|
| longitude is highly correlated with latitude | High correlation |
| latitude is highly correlated with longitude | High correlation |
| total_rooms is highly correlated with total_bedrooms and 2 other fields | High correlation |
| total_bedrooms is highly correlated with total_rooms and 2 other fields | High correlation |
| population is highly correlated with total_rooms and 2 other fields | High correlation |
| households is highly correlated with total_rooms and 2 other fields | High correlation |
| median_income is highly correlated with median_house_value | High correlation |
| median_house_value is highly correlated with median_income | High correlation |
| latitude is highly correlated with longitude and 1 other field | High correlation |
| median_house_value is highly correlated with latitude and 1 other field | High correlation |
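The alert wording follows a fixed pattern: a variable is flagged when its correlation with another variable exceeds a threshold, and any further partners are summarised as "and N other fields". A minimal sketch of that pattern, assuming a hypothetical threshold of 0.9 on the absolute correlation and a plain dict-of-dicts correlation matrix (neither is taken from the report):

```python
from typing import Dict, List

# Hypothetical cut-off; the report does not say which threshold produced
# the alerts above.
THRESHOLD = 0.9


def correlation_alerts(corr: Dict[str, Dict[str, float]],
                       threshold: float = THRESHOLD) -> List[str]:
    """Build 'X is highly correlated with Y [and N other fields]' messages."""
    alerts = []
    for column, row in corr.items():
        # Other columns whose absolute correlation exceeds the threshold.
        partners = [other for other, value in row.items()
                    if other != column and abs(value) > threshold]
        if not partners:
            continue
        message = f"{column} is highly correlated with {partners[0]}"
        extra = len(partners) - 1
        if extra == 1:
            message += " and 1 other field"
        elif extra > 1:
            message += f" and {extra} other fields"
        alerts.append(message)
    return alerts


# Made-up correlation matrix, for illustration only.
toy = {
    "a": {"a": 1.0, "b": 0.95, "c": 0.20},
    "b": {"a": 0.95, "b": 1.0, "c": 0.93},
    "c": {"a": 0.20, "b": 0.93, "c": 1.0},
}
for line in correlation_alerts(toy):
    print(line)
```

Variables with a single strong partner produce the short form. Each additional partner is folded into the count, which is why several variables above read "and 2 other fields".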
Reproduction
| Analysis started | 2022-10-11 10:01:19.749428 |
|---|---|
| Analysis finished | 2022-10-11 10:01:38.503267 |
| Duration | 18.75 seconds |
| Software version | pandas-profiling v3.1.0 |
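The reported duration is the difference between the two timestamps in the Reproduction table. A quick check with Python's standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2022-10-11 10:01:19.749428")
finished = datetime.fromisoformat("2022-10-11 10:01:38.503267")

# Subtracting datetimes gives a timedelta; convert it to seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```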
Variables
longitude
Real number (ℝ)
| Distinct | 607 |
|---|---|
| Distinct (%) | 20.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -119.5892 |
| Minimum | -124.18 |
| Maximum | -114.49 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 3000 |
| Negative (%) | 100.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | -124.18 |
|---|---|
| 5-th percentile | -122.47 |
| Q1 | -121.81 |
| median | -118.485 |
| Q3 | -118.02 |
| 95-th percentile | -117.1 |
| Maximum | -114.49 |
| Range | 9.69 |
| Interquartile range (IQR) | 3.79 |
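Range and IQR are derived rows. Range = Maximum - Minimum = -114.49 - (-124.18) = 9.69, and IQR = Q3 - Q1 = -118.02 - (-121.81) = 3.79. The sketch below computes the same quantile rows for any list of numbers. It assumes linear interpolation between order statistics, which is what `statistics.quantiles(..., method="inclusive")` implements and what pandas' `Series.quantile` uses by default. Whether the report used exactly this method is an assumption.

```python
import statistics
from typing import Dict, Sequence


def quantile_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Quantile rows as in the report, using linear interpolation (assumed)."""
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    twentieths = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "Minimum": lo,
        "5-th percentile": twentieths[0],
        "Q1": quartiles[0],
        "median": quartiles[1],
        "Q3": quartiles[2],
        "95-th percentile": twentieths[-1],
        "Maximum": hi,
        "Range": hi - lo,
        "Interquartile range (IQR)": quartiles[2] - quartiles[0],
    }


# The ten longitude values from the "First rows" table. This is a tiny
# sample, so the output will not match the full-column statistics above.
sample = [-122.05, -118.30, -117.81, -118.36, -119.67,
          -119.56, -121.43, -120.65, -122.84, -118.02]
for name, value in quantile_statistics(sample).items():
    print(f"{name}: {value:.4f}")
```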
Descriptive statistics
| Standard deviation | 1.994936294 |
|---|---|
| Coefficient of variation (CV) | -0.01668157571 |
| Kurtosis | -1.36277166 |
| Mean | -119.5892 |
| Median Absolute Deviation (MAD) | 1.275 |
| Skewness | -0.2978576326 |
| Sum | -358767.6 |
| Variance | 3.979770817 |
| Monotonicity | Not monotonic |
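Several descriptive statistics are derived from others. The variance is the square of the standard deviation. The coefficient of variation is CV = standard deviation / mean, which is why it is negative for longitude (1.994936294 / -119.5892 ≈ -0.01668). The MAD row is the median absolute deviation, median(|x - median(x)|), reported without a scaling constant. A minimal sketch, assuming the sample (n - 1) standard deviation that pandas uses by default:

```python
import statistics
from typing import Dict, Sequence


def descriptive_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample std/variance, CV and median absolute deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)
    return {
        "Mean": mean,
        "Standard deviation": std,
        "Variance": std ** 2,
        # Undefined (ZeroDivisionError) when the mean is exactly 0.
        "Coefficient of variation (CV)": std / mean,
        "Median Absolute Deviation (MAD)": mad,
        "Sum": sum(values),
    }


# Consistency check against the longitude rows reported above.
reported_std, reported_mean = 1.994936294, -119.5892
print(reported_std / reported_mean)  # CV
print(reported_std ** 2)             # variance
```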
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -118.26 | 26 | 0.9% |
| -118.21 | 26 | 0.9% |
| -118.28 | 25 | 0.8% |
| -118.27 | 25 | 0.8% |
| -118.29 | 25 | 0.8% |
| -118.3 | 24 | 0.8% |
| -118.14 | 23 | 0.8% |
| -118.35 | 22 | 0.7% |
| -118.31 | 21 | 0.7% |
| -118.02 | 21 | 0.7% |
| Other values (597) | 2762 | 92.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -124.18 | 1 | < 0.1% |
| -124.17 | 1 | < 0.1% |
| -124.16 | 4 | 0.1% |
| -124.15 | 1 | < 0.1% |
| -124.14 | 3 | 0.1% |
| -124.1 | 1 | < 0.1% |
| -124.09 | 2 | 0.1% |
| -124.01 | 1 | < 0.1% |
| -123.92 | 1 | < 0.1% |
| -123.85 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -114.49 | 1 | < 0.1% |
| -114.55 | 1 | < 0.1% |
| -114.61 | 1 | < 0.1% |
| -114.62 | 1 | < 0.1% |
| -114.98 | 1 | < 0.1% |
| -115.49 | 1 | < 0.1% |
| -115.52 | 1 | < 0.1% |
| -115.56 | 1 | < 0.1% |
| -115.57 | 4 | 0.1% |
| -115.59 | 1 | < 0.1% |
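The three value tables can be rebuilt from raw values with a counter. The common-values table lists the ten most frequent values plus an "Other values" row. The extreme-value tables list the ten smallest and ten largest distinct values. The percentage rule below, including the "< 0.1%" label for non-zero counts that round to 0.0%, is an assumption reconstructed from the tables in this report, not pandas-profiling's own code.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[float, int, str]


def fmt_frequency(count: int, total: int) -> str:
    """Share of rows as a percentage with one decimal place.

    Assumed rule: non-zero shares that round to 0.0% are shown as '< 0.1%'.
    """
    percent = 100 * count / total
    if count > 0 and round(percent, 1) == 0:
        return "< 0.1%"
    return f"{percent:.1f}%"


def value_tables(values: Sequence[float], k: int = 10):
    """Return (common values, other-values row, k smallest, k largest)."""
    counts = Counter(values)
    total = len(values)

    def row(value: float) -> Row:
        return (value, counts[value], fmt_frequency(counts[value], total))

    # Most frequent values first (ties keep first-seen order).
    common: List[Row] = [row(v) for v, _ in counts.most_common(k)]
    other_count = total - sum(c for _, c, _ in common)
    other_row = (len(counts) - len(common), other_count,
                 fmt_frequency(other_count, total))

    distinct = sorted(counts)
    smallest = [row(v) for v in distinct[:k]]
    largest = [row(v) for v in reversed(distinct[-k:])]
    return common, other_row, smallest, largest


# median_house_value from the "Last rows" table, used as a tiny sample.
sample = [150000, 253800, 500001, 500001, 182500,
          225000, 237200, 62000, 162500, 500001]
common, other_row, smallest, largest = value_tables(sample, k=3)
print(common)
print(other_row)
print(smallest)
print(largest)
```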
latitude
Real number (ℝ≥0)
| Distinct | 587 |
|---|---|
| Distinct (%) | 19.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.63539 |
| Minimum | 32.56 |
| Maximum | 41.92 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 32.56 |
|---|---|
| 5-th percentile | 32.82 |
| Q1 | 33.93 |
| median | 34.27 |
| Q3 | 37.69 |
| 95-th percentile | 38.97 |
| Maximum | 41.92 |
| Range | 9.36 |
| Interquartile range (IQR) | 3.76 |
Descriptive statistics
| Standard deviation | 2.129669523 |
|---|---|
| Coefficient of variation (CV) | 0.05976276739 |
| Kurtosis | -1.12437247 |
| Mean | 35.63539 |
| Median Absolute Deviation (MAD) | 1.25 |
| Skewness | 0.4598159368 |
| Sum | 106906.17 |
| Variance | 4.535492279 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 34.02 | 35 | 1.2% |
| 34.06 | 33 | 1.1% |
| 34.05 | 32 | 1.1% |
| 34.09 | 31 | 1.0% |
| 34.11 | 31 | 1.0% |
| 34.07 | 31 | 1.0% |
| 33.93 | 30 | 1.0% |
| 33.91 | 30 | 1.0% |
| 33.84 | 28 | 0.9% |
| 33.97 | 27 | 0.9% |
| Other values (577) | 2692 | 89.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.56 | 1 | < 0.1% |
| 32.57 | 3 | 0.1% |
| 32.58 | 6 | 0.2% |
| 32.59 | 2 | 0.1% |
| 32.6 | 1 | < 0.1% |
| 32.61 | 4 | 0.1% |
| 32.62 | 2 | 0.1% |
| 32.64 | 2 | 0.1% |
| 32.66 | 3 | 0.1% |
| 32.67 | 1 | < 0.1% |
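Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41.92 | 1 | < 0.1% |
| 41.8 | 1 | < 0.1% |
| 41.63 | 1 | < 0.1% |
| 41.54 | 1 | < 0.1% |
| 41.31 | 1 | < 0.1% |
| 41.28 | 1 | < 0.1% |
| 41.23 | 1 | < 0.1% |
| 41.2 | 1 | < 0.1% |
| 41.01 | 1 | < 0.1% |
| 40.99 | 1 | < 0.1% |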
housing_median_age
Real number (ℝ≥0)
| Distinct | 52 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.84533333 |
| Minimum | 1 |
| Maximum | 52 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 18 |
| median | 29 |
| Q3 | 37 |
| 95-th percentile | 52 |
| Maximum | 52 |
| Range | 51 |
| Interquartile range (IQR) | 19 |
Descriptive statistics
| Standard deviation | 12.55539555 |
|---|---|
| Coefficient of variation (CV) | 0.4352660935 |
| Kurtosis | -0.8037837284 |
| Mean | 28.84533333 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.01851312116 |
| Sum | 86536 |
| Variance | 157.6379575 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 173 | 5.8% |
| 35 | 118 | 3.9% |
| 36 | 115 | 3.8% |
| 16 | 107 | 3.6% |
| 34 | 102 | 3.4% |
| 17 | 100 | 3.3% |
| 32 | 91 | 3.0% |
| 26 | 88 | 2.9% |
| 37 | 88 | 2.9% |
| 25 | 86 | 2.9% |
| Other values (42) | 1932 | 64.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2 | 0.1% |
| 2 | 6 | 0.2% |
| 3 | 12 | 0.4% |
| 4 | 28 | 0.9% |
| 5 | 39 | 1.3% |
| 6 | 25 | 0.8% |
| 7 | 20 | 0.7% |
| 8 | 25 | 0.8% |
| 9 | 27 | 0.9% |
| 10 | 30 | 1.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 173 | 5.8% |
| 51 | 11 | 0.4% |
| 50 | 16 | 0.5% |
| 49 | 21 | 0.7% |
| 48 | 34 | 1.1% |
| 47 | 22 | 0.7% |
| 46 | 41 | 1.4% |
| 45 | 51 | 1.7% |
| 44 | 51 | 1.7% |
| 43 | 56 | 1.9% |
total_rooms
Real number (ℝ≥0)
| Distinct | 2215 |
|---|---|
| Distinct (%) | 73.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2599.578667 |
| Minimum | 6 |
| Maximum | 30450 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 585.9 |
| Q1 | 1401 |
| median | 2106 |
| Q3 | 3129 |
| 95-th percentile | 6016.45 |
| Maximum | 30450 |
| Range | 30444 |
| Interquartile range (IQR) | 1728 |
Descriptive statistics
| Standard deviation | 2155.593332 |
|---|---|
| Coefficient of variation (CV) | 0.8292087327 |
| Kurtosis | 32.09994094 |
| Mean | 2599.578667 |
| Median Absolute Deviation (MAD) | 815.5 |
| Skewness | 4.167637359 |
| Sum | 7798736 |
| Variance | 4646582.611 |
| Monotonicity | Not monotonic |
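total_rooms is strongly right-skewed. Its maximum (30450) is about five times its 95th percentile (6016.45), which shows up as a skewness of about 4.17 and a kurtosis of about 32.1. The report does not say how these two rows are computed. Assuming they follow pandas' `Series.skew()` and `Series.kurt()` (bias-adjusted sample skewness and excess kurtosis), the standard formulas can be sketched as follows. Symmetric data has a skewness of 0. Excess kurtosis is 0 for a normal distribution and negative for flat data.

```python
import math
from typing import Sequence


def _central_moment(values: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = math.fsum(values) / len(values)
    return math.fsum((x - mean) ** order for x in values) / len(values)


def sample_skewness(values: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n = len(values)
    g1 = _central_moment(values, 3) / _central_moment(values, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(values: Sequence[float]) -> float:
    """Bias-adjusted excess kurtosis G2 (needs at least 4 values)."""
    n = len(values)
    g2 = _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


print(sample_skewness([1, 2, 3, 4, 5]))         # symmetric data
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))  # flat data, so negative
print(sample_skewness([1, 2, 3, 4, 100]))       # one large value, right skew
```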
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1778 | 5 | 0.2% |
| 1966 | 5 | 0.2% |
| 907 | 5 | 0.2% |
| 1787 | 5 | 0.2% |
| 2127 | 4 | 0.1% |
| 1564 | 4 | 0.1% |
| 1005 | 4 | 0.1% |
| 1499 | 4 | 0.1% |
| 1531 | 4 | 0.1% |
| 2914 | 4 | 0.1% |
| Other values (2205) | 2956 | 98.5% |
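Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1 | < 0.1% |
| 16 | 1 | < 0.1% |
| 18 | 1 | < 0.1% |
| 19 | 1 | < 0.1% |
| 21 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 32 | 2 | 0.1% |
| 38 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 41 | 1 | < 0.1% |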
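Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 30450 | 1 | < 0.1% |
| 27870 | 1 | < 0.1% |
| 24121 | 1 | < 0.1% |
| 23915 | 1 | < 0.1% |
| 21988 | 1 | < 0.1% |
| 20354 | 1 | < 0.1% |
| 18132 | 1 | < 0.1% |
| 18123 | 1 | < 0.1% |
| 17470 | 1 | < 0.1% |
| 16590 | 1 | < 0.1% |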
total_bedrooms
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 1055 |
|---|---|
| Distinct (%) | 35.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 529.9506667 |
| Minimum | 2 |
| Maximum | 5419 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 130.95 |
| Q1 | 291 |
| median | 437 |
| Q3 | 636 |
| 95-th percentile | 1220.1 |
| Maximum | 5419 |
| Range | 5417 |
| Interquartile range (IQR) | 345 |
Descriptive statistics
| Standard deviation | 415.6543681 |
|---|---|
| Coefficient of variation (CV) | 0.7843265313 |
| Kurtosis | 28.53707082 |
| Mean | 529.9506667 |
| Median Absolute Deviation (MAD) | 165 |
| Skewness | 3.863393189 |
| Sum | 1589852 |
| Variance | 172768.5538 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 314 | 15 | 0.5% |
| 270 | 12 | 0.4% |
| 299 | 11 | 0.4% |
| 274 | 10 | 0.3% |
| 298 | 10 | 0.3% |
| 348 | 10 | 0.3% |
| 301 | 10 | 0.3% |
| 528 | 10 | 0.3% |
| 493 | 10 | 0.3% |
| 292 | 10 | 0.3% |
| Other values (1045) | 2892 | 96.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 7 | 2 | 0.1% |
| 8 | 5 | 0.2% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 14 | 3 | 0.1% |
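Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5419 | 1 | < 0.1% |
| 5033 | 1 | < 0.1% |
| 5027 | 1 | < 0.1% |
| 4585 | 1 | < 0.1% |
| 4522 | 1 | < 0.1% |
| 4135 | 1 | < 0.1% |
| 4055 | 1 | < 0.1% |
| 3493 | 1 | < 0.1% |
| 3173 | 1 | < 0.1% |
| 2971 | 1 | < 0.1% |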
population
Real number (ℝ≥0)
| Distinct | 1802 |
|---|---|
| Distinct (%) | 60.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1402.798667 |
| Minimum | 5 |
| Maximum | 11935 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 346.95 |
| Q1 | 780 |
| median | 1155 |
| Q3 | 1742.75 |
| 95-th percentile | 3238.3 |
| Maximum | 11935 |
| Range | 11930 |
| Interquartile range (IQR) | 962.75 |
Descriptive statistics
| Standard deviation | 1030.543012 |
|---|---|
| Coefficient of variation (CV) | 0.7346335842 |
| Kurtosis | 16.44326818 |
| Mean | 1402.798667 |
| Median Absolute Deviation (MAD) | 450 |
| Skewness | 2.949670691 |
| Sum | 4208396 |
| Variance | 1062018.9 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 870 | 7 | 0.2% |
| 753 | 6 | 0.2% |
| 697 | 6 | 0.2% |
| 881 | 6 | 0.2% |
| 1211 | 6 | 0.2% |
| 1581 | 5 | 0.2% |
| 568 | 5 | 0.2% |
| 769 | 5 | 0.2% |
| 494 | 5 | 0.2% |
| 1277 | 5 | 0.2% |
| Other values (1792) | 2944 | 98.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 1 | < 0.1% |
| 8 | 2 | 0.1% |
| 14 | 2 | 0.1% |
| 19 | 1 | < 0.1% |
| 21 | 1 | < 0.1% |
| 22 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 26 | 1 | < 0.1% |
| 27 | 3 | 0.1% |
| 29 | 1 | < 0.1% |
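Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11935 | 1 | < 0.1% |
| 11139 | 1 | < 0.1% |
| 10877 | 1 | < 0.1% |
| 9419 | 1 | < 0.1% |
| 8824 | 1 | < 0.1% |
| 8768 | 1 | < 0.1% |
| 8152 | 1 | < 0.1% |
| 7604 | 1 | < 0.1% |
| 7596 | 1 | < 0.1% |
| 7560 | 1 | < 0.1% |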
households
Real number (ℝ≥0)
| Distinct | 1026 |
|---|---|
| Distinct (%) | 34.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 489.912 |
| Minimum | 2 |
| Maximum | 4930 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 122.95 |
| Q1 | 273 |
| median | 409.5 |
| Q3 | 597.25 |
| 95-th percentile | 1113 |
| Maximum | 4930 |
| Range | 4928 |
| Interquartile range (IQR) | 324.25 |
Descriptive statistics
| Standard deviation | 365.4227098 |
|---|---|
| Coefficient of variation (CV) | 0.7458945888 |
| Kurtosis | 26.22936135 |
| Mean | 489.912 |
| Median Absolute Deviation (MAD) | 153.5 |
| Skewness | 3.559753412 |
| Sum | 1469736 |
| Variance | 133533.7568 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 375 | 12 | 0.4% |
| 273 | 12 | 0.4% |
| 614 | 12 | 0.4% |
| 340 | 11 | 0.4% |
| 363 | 11 | 0.4% |
| 429 | 11 | 0.4% |
| 456 | 11 | 0.4% |
| 335 | 11 | 0.4% |
| 239 | 11 | 0.4% |
| 287 | 11 | 0.4% |
| Other values (1016) | 2887 | 96.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | < 0.1% |
| 3 | 2 | 0.1% |
| 7 | 2 | 0.1% |
| 8 | 2 | 0.1% |
| 9 | 5 | 0.2% |
| 10 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 14 | 3 | 0.1% |
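Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4930 | 1 | < 0.1% |
| 4855 | 1 | < 0.1% |
| 4176 | 1 | < 0.1% |
| 3958 | 1 | < 0.1% |
| 3293 | 1 | < 0.1% |
| 3252 | 1 | < 0.1% |
| 3197 | 1 | < 0.1% |
| 2964 | 1 | < 0.1% |
| 2651 | 1 | < 0.1% |
| 2392 | 1 | < 0.1% |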
median_income
Real number (ℝ≥0)
| Distinct | 2578 |
|---|---|
| Distinct (%) | 85.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.8072718 |
| Minimum | 0.4999 |
| Maximum | 15.0001 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 0.4999 |
|---|---|
| 5-th percentile | 1.56239 |
| Q1 | 2.544 |
| median | 3.48715 |
| Q3 | 4.656475 |
| 95-th percentile | 6.97549 |
| Maximum | 15.0001 |
| Range | 14.5002 |
| Interquartile range (IQR) | 2.112475 |
Descriptive statistics
| Standard deviation | 1.85451173 |
|---|---|
| Coefficient of variation (CV) | 0.4870972778 |
| Kurtosis | 5.626184149 |
| Mean | 3.8072718 |
| Median Absolute Deviation (MAD) | 1.02845 |
| Skewness | 1.698511735 |
| Sum | 11421.8154 |
| Variance | 3.439213756 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.0001 | 9 | 0.3% |
| 4 | 8 | 0.3% |
| 3.375 | 8 | 0.3% |
| 2.125 | 7 | 0.2% |
| 3.875 | 7 | 0.2% |
| 3.25 | 7 | 0.2% |
| 2.75 | 7 | 0.2% |
| 2.375 | 6 | 0.2% |
| 3.6875 | 6 | 0.2% |
| 3.625 | 6 | 0.2% |
| Other values (2568) | 2929 | 97.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.4999 | 1 | < 0.1% |
| 0.536 | 3 | 0.1% |
| 0.5495 | 1 | < 0.1% |
| 0.7054 | 1 | < 0.1% |
| 0.7403 | 1 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 0.8054 | 1 | < 0.1% |
| 0.8185 | 1 | < 0.1% |
| 0.8252 | 1 | < 0.1% |
| 0.844 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.0001 | 9 | 0.3% |
| 14.2867 | 1 | < 0.1% |
| 13.6623 | 1 | < 0.1% |
| 12.8763 | 1 | < 0.1% |
| 12.6417 | 1 | < 0.1% |
| 12.3767 | 1 | < 0.1% |
| 11.806 | 1 | < 0.1% |
| 11.7794 | 1 | < 0.1% |
| 11.5706 | 1 | < 0.1% |
| 11.1978 | 1 | < 0.1% |
median_house_value
Real number (ℝ≥0)
| Distinct | 1784 |
|---|---|
| Distinct (%) | 59.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 205846.275 |
| Minimum | 22500 |
| Maximum | 500001 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 46.9 KiB |
Quantile statistics
| Minimum | 22500 |
|---|---|
| 5-th percentile | 67785 |
| Q1 | 121200 |
| median | 177650 |
| Q3 | 263975 |
| 95-th percentile | 465640 |
| Maximum | 500001 |
| Range | 477501 |
| Interquartile range (IQR) | 142775 |
Descriptive statistics
| Standard deviation | 113119.6875 |
|---|---|
| Coefficient of variation (CV) | 0.5495347801 |
| Kurtosis | 0.3953989964 |
| Mean | 205846.275 |
| Median Absolute Deviation (MAD) | 68000 |
| Skewness | 0.9895619132 |
| Sum | 617538825 |
| Variance | 1.279606369 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 500001 | 125 | 4.2% |
| 137500 | 23 | 0.8% |
| 162500 | 21 | 0.7% |
| 225000 | 17 | 0.6% |
| 350000 | 14 | 0.5% |
| 87500 | 13 | 0.4% |
| 100000 | 13 | 0.4% |
| 187500 | 13 | 0.4% |
| 112500 | 12 | 0.4% |
| 150000 | 11 | 0.4% |
| Other values (1774) | 2738 | 91.3% |
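Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 22500 | 1 | < 0.1% |
| 37500 | 1 | < 0.1% |
| 39200 | 1 | < 0.1% |
| 39800 | 1 | < 0.1% |
| 40000 | 1 | < 0.1% |
| 41500 | 1 | < 0.1% |
| 42500 | 1 | < 0.1% |
| 42700 | 1 | < 0.1% |
| 43100 | 1 | < 0.1% |
| 43300 | 1 | < 0.1% |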
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500001 | 125 | 4.2% |
| 500000 | 4 | 0.1% |
| 495800 | 1 | < 0.1% |
| 495500 | 1 | < 0.1% |
| 494700 | 1 | < 0.1% |
| 493200 | 1 | < 0.1% |
| 492300 | 1 | < 0.1% |
| 492000 | 1 | < 0.1% |
| 489800 | 1 | < 0.1% |
| 487100 | 1 | < 0.1% |
Correlations
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic relationship, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic relationship. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
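A minimal standard-library sketch of the definition above. Each variable is ranked, with tied values given the average of the ranks they occupy (the usual convention), and Pearson's r is then computed on the ranks. This illustrates the definition and is not pandas-profiling's implementation. The example pairs x with its cube, a relationship that is monotonic but not linear.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance over the product of standard deviations (1/n cancels)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman(x, y))    # perfectly monotonic
print(pearson(x, y))     # below 1 because the relation is not linear
```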
Pearson's r
The Pearson correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear relationship, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear relationship. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a result, for an exact linear relationship the steepness of the line does not affect r; only the sign of its slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
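The definition and the invariance property can be checked directly. The sketch computes r as the covariance over the product of standard deviations; the 1/n factors cancel, so plain sums are used. It then demonstrates three properties of r:

- Shifting and positively rescaling each variable leaves r unchanged.
- Any exactly increasing line gives r = 1, regardless of its slope.
- Negating one variable only flips the sign of r.

The input lists are arbitrary illustrative values.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance over the product of standard deviations.

    The 1/n (or 1/(n-1)) factors cancel, so plain sums are used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(x, y)
# Separate shifts and positive rescalings of x and y leave r unchanged.
r_transformed = pearson([10 * a - 7 for a in x], [0.5 * b + 3 for b in y])
# An exact increasing line gives r = 1 whatever its slope.
r_line = pearson(x, [0.01 * a + 100 for a in x])
# A negative scale factor flips only the sign.
r_flipped = pearson(x, [-b for b in y])
print(r, r_transformed, r_line, r_flipped)
```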
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. This simple form, τ-a, does not adjust for ties.
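Counting concordant and discordant pairs directly gives the tie-free τ-a described above. The sketch is O(n²) and only illustrates the definition. Library implementations such as SciPy's `scipy.stats.kendalltau` default to the τ-b variant, which adjusts the denominator for ties, so results can differ when ties are present.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs, the tie-free form (tau-a)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))  # 8 concordant, 2 discordant out of 10 pairs
```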
Phik (φk)
Phik (φk) is a new and practical correlation coefficient. It works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Missing values
Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.05 | 37.37 | 27.0 | 3885.0 | 661.0 | 1537.0 | 606.0 | 6.6085 | 344700.0 |
| 1 | -118.30 | 34.26 | 43.0 | 1510.0 | 310.0 | 809.0 | 277.0 | 3.5990 | 176500.0 |
| 2 | -117.81 | 33.78 | 27.0 | 3589.0 | 507.0 | 1484.0 | 495.0 | 5.7934 | 270500.0 |
| 3 | -118.36 | 33.82 | 28.0 | 67.0 | 15.0 | 49.0 | 11.0 | 6.1359 | 330000.0 |
| 4 | -119.67 | 36.33 | 19.0 | 1241.0 | 244.0 | 850.0 | 237.0 | 2.9375 | 81700.0 |
| 5 | -119.56 | 36.51 | 37.0 | 1018.0 | 213.0 | 663.0 | 204.0 | 1.6635 | 67000.0 |
| 6 | -121.43 | 38.63 | 43.0 | 1009.0 | 225.0 | 604.0 | 218.0 | 1.6641 | 67000.0 |
| 7 | -120.65 | 35.48 | 19.0 | 2310.0 | 471.0 | 1341.0 | 441.0 | 3.2250 | 166900.0 |
| 8 | -122.84 | 38.40 | 15.0 | 3080.0 | 617.0 | 1446.0 | 599.0 | 3.6696 | 194400.0 |
| 9 | -118.02 | 34.08 | 31.0 | 2402.0 | 632.0 | 2830.0 | 603.0 | 2.3333 | 164200.0 |
Last rows
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 2990 | -118.23 | 34.09 | 49.0 | 1638.0 | 456.0 | 1500.0 | 430.0 | 2.6923 | 150000.0 |
| 2991 | -117.17 | 34.28 | 13.0 | 4867.0 | 718.0 | 780.0 | 250.0 | 7.1997 | 253800.0 |
| 2992 | -122.33 | 37.39 | 52.0 | 573.0 | 102.0 | 232.0 | 92.0 | 6.2263 | 500001.0 |
| 2993 | -117.91 | 33.60 | 37.0 | 2088.0 | 510.0 | 673.0 | 390.0 | 5.1048 | 500001.0 |
| 2994 | -117.93 | 33.86 | 35.0 | 931.0 | 181.0 | 516.0 | 174.0 | 5.5867 | 182500.0 |
| 2995 | -119.86 | 34.42 | 23.0 | 1450.0 | 642.0 | 1258.0 | 607.0 | 1.1790 | 225000.0 |
| 2996 | -118.14 | 34.06 | 27.0 | 5257.0 | 1082.0 | 3496.0 | 1036.0 | 3.3906 | 237200.0 |
| 2997 | -119.70 | 36.30 | 10.0 | 956.0 | 201.0 | 693.0 | 220.0 | 2.2895 | 62000.0 |
| 2998 | -117.12 | 34.10 | 40.0 | 96.0 | 14.0 | 46.0 | 14.0 | 3.2708 | 162500.0 |
| 2999 | -119.63 | 34.42 | 42.0 | 1765.0 | 263.0 | 753.0 | 260.0 | 8.5608 | 500001.0 |